We study the crowding of near-extreme events in the time gaps between successive finishers 
in major international marathons. Naively, one might expect these gaps to become progressively 
larger for better-placing finishers. While such an increase does indeed occur from the middle of the 
finishing pack down to approximately 20th place, the gaps saturate for the first 10-20 finishers. We 
give a probabilistic account of this feature. However, the data suggest that the gaps have a weak 
maximum around 10th place, a feature that seems to have a sociological origin. 

PACS numbers: 01.80.+b, 02.50.-r, 05.40.-a, 89.75.Da 

It is fun to learn about sports statistics and discuss their implications among fellow sports fans. The existence of 
comprehensive web-based resources for sports statistics, whose easy availability was unimaginable just a few years 
ago, has perhaps helped promote such activities. In this note, we investigate one such statistic, namely, the finishing 
times of individual runners in major marathons. Our main interest is in the dependence of the time gaps between 
successive finishers on finishing place. More precisely, let $t_k$ be the time of the $k$th finisher. Then we wish to understand 
how the time gaps $g_k = t_{k+1} - t_k$ depend on finishing place $k$. Because front runners are rare and potential race leaders 
are rarer still, the natural expectation is that the gaps between successive finishers should increase monotonically in 
moving from the middle of the pack towards the increasingly-rare front runners. However, the data show that the 
time gaps saturate to a constant value for sufficiently small k. We suggest that sociological factors may contribute to 
this anomaly in the gaps. 

FIG. 1: Distribution of all finishing times (smoothed over a 20-point range for visual clarity) for the Boston, Chicago, and New 
York marathons, 2000-2007. Notice the peaks at 3 hours in all the data, the prominent peaks at 4 hours for the Chicago and 
New York marathons, and the secondary peaks at 3:10, 3:20, and 3:30 for the Chicago marathon. The dashed curve shows the 
distribution of Eq. (4), with parameter values as given in the text. The inset shows the data in the range of 2:08-2:45. 



The results presented here are based on data for finishing times in major international marathons that attract 
world-class entrants. These include Boston, Chicago, and New York from 2000-2007 (entire fields), as well as Berlin 
1992 and 1999-2007, Fukuoka 2006-2007, London 2001-2007, and Paris 2004, 2006-2007 (first 100 places for all 
non-US races). Data for other years in these non-US marathons are either not readily available or corrupted, and some of the 
data used in this work required corrections of a few obviously erroneous results. In these marathons, the winning time 
is in the range 2:05-2:10. For example, in Boston, Chicago, and New York, the course records are 2:07:14, 2:05:42, 
and 2:07:43, respectively, while the current world record, set by Haile Gebrselassie in the 2007 Berlin Marathon, is 
2:04:26. After the race winner, there is a trickle of fast finishers that gradually turns into a steady flow as the finish 
time approaches 3 hours. The main pack arrives in the range of 3-6 hours, with a decreasing stream of progressively 
slower stragglers. Thus one naturally anticipates the distribution of finish times shown in Fig. 1. 

Upon examining these distributions critically, a number of curiosities can be seen. First, in spite of the data 
smoothing, there are visible peaks at just under 3 hours and 4 hours for all three marathons. For the Chicago 
marathon in particular, where the course is flat and well-suited for pacing, one can even discern secondary peaks 
near 3:10, 3:20, and 3:30 (Fig. 1). The existence of such peaks suggests that the distribution of finish times in this 
range does not reflect a performance limit, but rather, the surmounting of a psychological barrier. Parenthetically, 
the apparent difference between the distribution for the Boston marathon (where challenging qualifying times exist) and 
those for the Chicago and New York marathons can be made to nearly disappear by plotting them in scaled units, namely, by 
making the abscissa the finish time divided by the average finish time for each set of 8 races. 

FIG. 2: Distribution of the average time gap $g_k$ between the $k$th and $(k+1)$st finishers for the US (o) and the European (+) 
marathons cited in the text. For the US marathons the first 10,000 gaps are shown, while the first 100 gaps are shown for the 
European marathons. The dashed line has a slope of $-1$, as given by Eq. (5). 

More interesting behavior, and the main point of this work, is the $k$ dependence of the time gaps $g_k$ between 
successive finishers. We are particularly interested in these gaps for finishers near the front of the pack. Thus we 
restrict ourselves to the first 10,000 finishers in the US marathons. This threshold corresponds to finishing times 
of about 4 hours for Chicago and New York, and around 3:45 for Boston. A comparison with Fig. 1 shows that these time 
thresholds lie before the peak of the finishing time distribution for Chicago and New York, and near the peak for 
Boston. For comparison, the average number of finishers over the last eight US marathons that we studied is 30,668 
for Chicago, 33,669 for New York, and 16,645 for Boston. For $k > 10{,}000$, $\langle g_k \rangle$ begins increasing, corresponding to 
the lagging tail of the finishing time distribution. For the European data, we quote $\langle g_k \rangle$ only to $k = 100$. 

Among the fastest finishers, the finish time distribution decays very slowly and is nearly constant for times less than 
2:30 (inset to Fig. 1). For the marathons that we studied, the average time gap between consecutive finishers among 
the first 10 places is in the range of 20-60 seconds, and does not have any clear systematic $k$ dependence (Fig. 2). 
Members of this group of elite runners are all possible candidates to win the race on any given day. In contrast, 
beyond 20th place, the average gap systematically decreases with $k$, a decrease that clearly reflects the increase 
in the density of runners as the leading edge of the pack arrives at the finish. 

We can make these observations quantitative by assuming that the finishing times of individual runners are independent 
and identically distributed (iid) random variables, and then using extreme-value statistics to determine the 
time gaps $g_k$ between successive finishers. As a preliminary, consider the time of the $k$th finisher. The typical 
value for this time can be determined from the extremal condition (which assumes self averaging) 

$$\int_0^{t_k} P(t)\, dt = \frac{k}{N}, \qquad (1)$$

which states that, out of $N$ finishers, there are $k$ individuals whose finishing times are less than $t_k$. The resulting estimate for the typical 
$k$th finishing time $t_k$ should be accurate for $k \gg 1$, where fluctuations in $t_k$ are negligible. More generally, we can 
compute the full probability distribution of $t_k$, as outlined in Appendix A, and thereby find the mean value of $t_k$ to be 



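$$\langle t_k \rangle = \int_0^\infty I\big(P_>(x);\, N-k+1,\, k\big)\, dx, \qquad (2)$$

where $I(y; a, b) = \left[\int_0^y x^{a-1} (1-x)^{b-1}\, dx\right] \Big/ \left[\int_0^1 x^{a-1} (1-x)^{b-1}\, dx\right]$ is the regularized (in the sense that $I(1; a, b) = 1$) 
incomplete Beta function and $P_>(x) = \int_x^\infty P(x')\, dx'$ is the exceedance probability. 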

The main message from either the exact result in Eq. (2) or the extremal condition in Eq. (1) is that the time gap 
$g_k = t_{k+1} - t_k$ has the following generic behaviors (see Appendix B for three simple examples): 

• If $P(t)$ is constant, then $\langle g_k \rangle$ is independent of $k$. 

• If $P(t)$ increases monotonically as $t$ increases, then $\langle g_k \rangle$ decreases monotonically as $k$ increases. 

• If $P(t)$ decreases monotonically as $t$ increases, then $\langle g_k \rangle$ increases monotonically as $k$ increases. 
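
These three trends can be checked with a quick Monte Carlo experiment. The sketch below is an illustration rather than part of the original analysis: the function name `mean_gaps`, the field size of 50 runners, the 4000 simulated races, and the three toy parent distributions are all assumptions chosen for convenience. It draws iid finishing times, sorts them, and averages the gaps $t_{k+1} - t_k$ over races.

```python
import random

def mean_gaps(sampler, n_runners, n_races, seed=0):
    """Average gap <g_k> = <t_{k+1} - t_k> over many simulated races."""
    rng = random.Random(seed)
    sums = [0.0] * (n_runners - 1)
    for _ in range(n_races):
        times = sorted(sampler(rng) for _ in range(n_runners))
        for k in range(n_runners - 1):
            sums[k] += times[k + 1] - times[k]
    return [s / n_races for s in sums]

# Three toy parent distributions (illustrative choices, not fits to data),
# each sampled by inverting its cumulative distribution.
samplers = {
    "constant P(t) on [0, 1]": lambda rng: rng.random(),
    "increasing P(t) = 2t on [0, 1]": lambda rng: rng.random() ** 0.5,
    "decreasing P(t) = exp(-t)": lambda rng: rng.expovariate(1.0),
}

N_RUNNERS, N_RACES = 50, 4000
results = {name: mean_gaps(s, N_RUNNERS, N_RACES) for name, s in samplers.items()}

for name, gaps in results.items():
    # gaps[0] is the gap between 1st and 2nd place, gaps[24] between 25th and 26th.
    print(f"{name}: <g_1>={gaps[0]:.4f}  <g_25>={gaps[24]:.4f}  <g_45>={gaps[44]:.4f}")
```

With these settings the constant case should give nearly the same mean gap at every place, while the increasing and decreasing cases should show gaps that shrink and grow with $k$, respectively.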



Let us now apply the above results to marathon finishing times. As a trivial and artificial initial example, suppose 
that the marathon runners' speeds $s$ are distributed exponentially, $P(s) = s_*^{-1} e^{-s/s_*}$, with $s_*$ a characteristic running 
speed. Then the distribution of finishing times $t = L/s$ would be 

$$P(t) = \frac{L}{s_* t^2}\, e^{-L/(s_* t)} = \frac{T}{t^2}\, e^{-T/t}, \qquad (3)$$

where $T = L/s_*$ is a typical finishing time for the field, and $L$ is the course length. Applying the extremal criterion 
(1) to this distribution gives the typical $k$th finishing time $t_k = T/\ln(N/k)$. While $t_k$ increases with $k$, as it must, 
this result has the unrealistic feature that the winning time approaches zero as the field becomes arbitrarily large. 
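
As a rough check of the estimate $t_k = T/\ln(N/k)$, one can sample exponentially distributed speeds and compare the simulated $k$th finishing time with the extremal prediction. The following is a minimal sketch under stated assumptions: the function names, the unit course length, the field size $N = 2000$, and the value $T = 2.75$ hours are illustrative choices, not quantities taken from the race data.

```python
import math
import random

def typical_time(k, n, T):
    """Extremal estimate t_k = T / ln(N/k) for exponentially distributed speeds."""
    return T / math.log(n / k)

def simulated_kth_time(k, n, T, n_races=200, seed=1):
    """Mean k-th finishing time over simulated races with exponential speeds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_races):
        # Course length L = 1, so the characteristic speed is s* = 1/T and t = 1/s.
        speeds = (rng.expovariate(T) for _ in range(n))
        times = sorted(1.0 / s if s > 0.0 else math.inf for s in speeds)
        total += times[k - 1]
    return total / n_races

T = 2.75  # hours; illustrative value only
estimate = typical_time(20, 2000, T)
simulated = simulated_kth_time(20, 2000, T)
print(f"k = 20, N = 2000: extremal {estimate:.3f} h, simulated {simulated:.3f} h")

# The typical winning time keeps shrinking as the field grows.
for n in (10**3, 10**6, 10**9):
    print(f"N = {n:>10}: typical winning time {typical_time(1, n, T):.3f} h")
```

The second loop makes the unrealistic feature explicit: because the estimate depends on the field size only through $\ln N$, the typical winning time falls without bound, though very slowly, as $N$ increases.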

More plausibly, the finishing time distribution should incorporate a non-zero fastest time $t_{\min}$. A slightly more 
refined example that obeys this constraint is 

$$P(t) = \frac{m}{T} \left(\frac{T}{\tau}\right)^{m+1} e^{-(T/\tau)^m}, \qquad (4)$$

where $\tau = t - t_{\min}$. The main new features of this distribution compared to Eq. (3) are the cutoff at $t_{\min}$ and the 
arbitrary exponent value $m$; the power-law prefactor is subdominant and it merely serves to simplify the calculations 
below. In fact, with the values $t_{\min} = 1.75$ hours, $T = 2.75$ hours and $m = 3$, Eq. (4) roughly follows the data in 
Fig. 1 (dashed curve). While one should not take the distribution and the parameter values too seriously, we will 
see that its precise form does not affect the behavior of the time gaps between successive finishers. 

Applying the extremal condition (1) to the distribution (4), and using the variable change $x = (T/\tau)^m$ to simplify 
the resulting integral, the typical value of the time gap is 

$$g_k \simeq \frac{T}{m\, k\, [\ln(N/k)]^{1 + 1/m}}. \qquad (5)$$



This $1/k$ dependence holds for any distribution with an exponentially fast cutoff near the lower limit. The behavior 
$g_k \propto 1/k$ accords well with the data beyond approximately 20th place. However, contrary to the prediction of Eq. (5), 
the data clearly show that there is an "excess" of elite runners (Fig. 2), as the time gaps between successive finishers 
are roughly constant for the first 20 places. Moreover, for the US races, the gaps between the first few consecutive 
finishers actually increase with $k$: as seen in Fig. 2, the largest gap for the US races occurs between 5th and 6th place. 

The reason that Eq. (5) does not capture the small-$k$ behavior seen in Fig. 2 is that the parent distribution in 
Eq. (4) quickly goes to zero close to the fastest finishing time $t_{\min}$, whereas the actual distribution becomes nearly 
flat in this regime (Fig. 1). If we were to consider a flat distribution $P(t)$, as suggested by the data shown in Fig. 1, 
then a constant gap would be reproduced. The generic behavior of the dependence of the gap $g_k$ on $k$ is discussed in 
Appendix B. Along these lines, a recent theory [4] predicts a crowding of runners near the front of marathon packs 
when the finishing time distribution is bounded from below. One additional feature of the gaps is that they begin 
to increase with $k$ for $k > 10{,}000$ (Fig. 2). This behavior also arises from Eq. (5) for large $k$. This regime corresponds to 
finishing times of more than 4 hours and is not relevant for our main conclusions. 
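
The dominant $1/k$ dependence and the eventual rise of the gaps at large $k$ can be illustrated numerically. The sketch below is hedged in two ways. First, it assumes that the finishing-time distribution has the cumulative form $\exp[-(T/\tau)^m]$ with $\tau = t - t_{\min}$, so that the extremal condition gives $\tau_k = T/[\ln(N/k)]^{1/m}$; this form is an assumption of the sketch. Second, the function names and the field size $N = 30{,}000$ are illustrative; only $T = 2.75$ hours and $m = 3$ are the values quoted in the text.

```python
import math

def typical_tau(k, n, T, m):
    """Typical k-th finishing time above t_min, from exp[-(T/tau)^m] = k/N."""
    return T / math.log(n / k) ** (1.0 / m)

def typical_gap(k, n, T, m):
    """Gap between the typical (k+1)-th and k-th finishing times."""
    return typical_tau(k + 1, n, T, m) - typical_tau(k, n, T, m)

N, T, M = 30000, 2.75, 3  # field size is illustrative; T and m as quoted in the text
for k in (20, 100, 1000, 8000, 20000):
    gap_seconds = 3600.0 * typical_gap(k, N, T, M)
    print(f"k = {k:>6}: typical gap {gap_seconds:8.3f} s, k * gap {k * gap_seconds:10.1f} s")
```

Under these assumptions the product $k\, g_k$ varies only slowly, through a logarithmic factor, for $k \ll N$, while the gap itself grows again once $k$ becomes a sizable fraction of $N$.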

Is there an explanation for having an excess of world-class runners? Many elite runners enjoy considerable incentives 
to maintain their competitive edge, including appearance money, access to the best support institutions (medical and 
athletic), etc. Thus if one achieves a time that qualifies as an elite performance, one is then in a position to take 
advantage of the various inducements offered to leading runners to maintain such a status. However, runners at the 
next tier of achievement face a daunting challenge. To run a marathon in the range, say, of 2:15-2:30 (for men) is still 
an impressive achievement that requires significant talent, dedication, and time commitment. However, such a finish 
time is too slow to be competitive at major marathons. Thus runners who finish in this range typically have little 
or no external support for their athletic activities and have to balance this all-consuming endeavor with the need to 
survive economically. Consequently, one may even anticipate a deficit of male runners who can complete a marathon 
in the range of 2:15-2:30. Such a feature does actually occur in the Boston marathon. 

It would be valuable to study whether a similar excess of elite performers exists in different athletic events or other forms of 
human competition. It is also worth mentioning that perhaps a similar elite excess occurs in human mortality, where 
there is a well-known mortality plateau among the longest-lived individuals [5, 6, 7]. Here again, there seems to be 
a self-selected sub-population of advantaged individuals who gain advantage both innately and perhaps because of 
external reinforcement. 

APPENDIX A: PROBABILITY DISTRIBUTION OF THE $k$th FINISHING TIME 



For a set of $N$ iid random times that are drawn from the same distribution $P(t)$, let $\{t_1, t_2, \ldots, t_N\}$ denote their 
ordered set, with $t_1 < t_2 < \cdots < t_N$. Thus $t_1$ denotes the winning time, $t_2$ denotes the 2nd place time, and $t_k$ denotes 
the $k$th fastest finishing time. 

The probability distribution of the $k$th fastest finishing time $t_k$ is given by 

$$\begin{aligned} f(t_k) &= \frac{N!}{(k-1)!\,(N-k)!}\, P(t_k)\, [P_>(t_k)]^{N-k}\, [1 - P_>(t_k)]^{k-1} \\ &= \frac{1}{B(N-k+1,\, k)}\, P(t_k)\, [P_>(t_k)]^{N-k}\, [1 - P_>(t_k)]^{k-1}. \qquad (A1) \end{aligned}$$

Equation (A1) merely specifies that $(N-k)$ variables are greater than $t_k$, $(k-1)$ variables are smaller than $t_k$, and 
one variable equals $t_k$. The combinatorial prefactor gives the number of such arrangements of these variables. In the 
second line, $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the Beta function, and we have defined the exceedance probability 

$$P_>(x) = \int_x^\infty P(x')\, dx',$$

namely, the probability that a variable chosen from the initial distribution $P$ exceeds $x$. This exceedance probability 
satisfies the obvious conditions $P_>(0) = 1$ and $P_>(\infty) = 0$. One can easily check from Eq. (A1) that $f(t_k)$ is 
normalized, i.e., $\int_0^\infty f(t_k)\, dt_k = 1$, as it must. 

The average value of the $k$th fastest finishing time is then 

$$\begin{aligned} \langle t_k \rangle &= \int_0^\infty x\, f(x)\, dx \\ &= -\int_0^\infty x\, \frac{dI}{dx}\, dx. \qquad (A2) \end{aligned}$$

In the second line we have introduced $I = I(y; a, b)$, the regularized incomplete Beta function, $I(y; a, b) = 
B(y; a, b)/B(a, b)$, in which $B(y; a, b)$ is the incomplete Beta function 

$$B(y; a, b) = \int_0^y u^{a-1} (1-u)^{b-1}\, du, \qquad y \in [0, 1],$$

$B(a, b) = B(1; a, b)$ is the standard Beta function, and $y = P_>(x)$, with $a = N-k+1$ and $b = k$. Integrating Eq. (A2) by parts, and using the fact 
that the integrated term vanishes at both endpoints, gives the mean $k$th finishing time expressed by Eq. (2). 
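
The integral for the mean $k$th finishing time can also be evaluated numerically. The sketch below is one possible way to do so and is not taken from the original work: it relies on the standard identity that, for positive integers $a$ and $b$, the regularized incomplete Beta function $I(y; a, b)$ equals the probability that a binomial variable with $a + b - 1$ trials and success probability $y$ is at least $a$. The function names, the trapezoidal rule, and the values $N = 20$, $k = 5$ are illustrative assumptions.

```python
import math

def regularized_beta_int(y, a, b):
    """I(y; a, b) for positive integers a, b, via the binomial tail-sum identity."""
    n = a + b - 1
    return sum(math.comb(n, j) * y**j * (1.0 - y) ** (n - j) for j in range(a, n + 1))

def mean_kth_time(exceedance, k, n, x_max, steps=20000):
    """Trapezoidal estimate of the integral of I(P_>(x); N-k+1, k) from 0 to x_max."""
    h = x_max / steps
    vals = [regularized_beta_int(exceedance(i * h), n - k + 1, k) for i in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

N, K = 20, 5
# Uniform P(t) on [0, 1]: P_>(x) = 1 - x, expected mean k/(N + 1).
uniform = mean_kth_time(lambda x: max(0.0, 1.0 - x), K, N, x_max=1.0)
# Exponential P(t) = exp(-t): P_>(x) = exp(-x), expected mean sum_{j=N-k+1}^{N} 1/j.
exponential = mean_kth_time(lambda x: math.exp(-x), K, N, x_max=40.0)
harmonic = sum(1.0 / j for j in range(N - K + 1, N + 1))

print(uniform, K / (N + 1))
print(exponential, harmonic)
```

The two printed pairs should agree closely, which gives a simple consistency check between the integral representation and the closed forms for the uniform and exponential cases.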



APPENDIX B: $\langle g_k \rangle$ FOR THREE SIMPLE CASES 



In this appendix we calculate $\langle g_k \rangle$ explicitly for three simple cases of $P(t)$. 

Case 1. Consider the uniform distribution, $P(t) = 1$ for $t \in [0, 1]$ and $P(t) = 0$ outside. Hence, $P_>(t) = 1 - t$. Then 

$$\langle t_k \rangle = \frac{1}{B(N-k+1,\, k)} \int_0^1 x \cdot x^{k-1} (1-x)^{N-k}\, dx = \frac{B(N-k+1,\, k+1)}{B(N-k+1,\, k)} = \frac{k}{N+1}. \qquad (B1)$$

Thus we obtain $\langle g_k \rangle = 1/(N+1)$ for all $k$, while using the extremal condition Eq. (1) one finds the typical gap 
$g_k \approx 1/N$. As expected, $\langle g_k \rangle$ is independent of $k$ for a uniform distribution. 

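Case 2. Consider the monotonically increasing distribution $P(t) = 2t$ for $t \in [0, 1]$. Then $P_>(t) = 1 - t^2$. Hence, 

$$\langle t_k \rangle = \frac{B(N-k+1,\, k+1/2)}{B(N-k+1,\, k)} = \frac{\Gamma(N+1)\, \Gamma(k+1/2)}{\Gamma(k)\, \Gamma(N+3/2)}. \qquad (B2)$$

From this exact calculation we find 

$$\langle g_k \rangle = \frac{\Gamma(N+1)}{\Gamma(N+3/2)}\, \frac{\Gamma(k+1/2)}{\Gamma(k)}\, \frac{1}{2k} \quad \text{for all } N \text{ and } k, \qquad \langle g_k \rangle \approx \frac{1}{2\sqrt{N}\sqrt{k}} \quad \text{for } N \gg k \gg 1. \qquad (B3)$$

Similarly, using Eq. (1), the typical value of the $k$th finishing time is $t_k \approx \sqrt{k/N}$, and hence 

$$g_k \approx \frac{\sqrt{k+1} - \sqrt{k}}{\sqrt{N}} \approx \frac{1}{2\sqrt{N}\sqrt{k}} \quad \text{for } N \gg k \gg 1. \qquad (B4)$$

These results show that $\langle g_k \rangle$ monotonically decreases as $k$ increases. That is, the gap between successive variables 
gets smaller when their density increases, as one would expect. 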

Case 3. Consider the monotonically decreasing distribution $P(t) = \exp(-t)$ where $t \in [0, \infty)$. In this case, 

$$\langle t_k \rangle = \frac{1}{B(N-k+1,\, k)} \int_0^\infty x\, e^{-(N-k+1)x} (1 - e^{-x})^{k-1}\, dx = \frac{-1}{B(N-k+1,\, k)} \int_0^1 \ln z\; z^{N-k} (1-z)^{k-1}\, dz.$$

The latter integral can be found in Gradshteyn and Ryzhik [8] and the final result is 

$$\langle t_k \rangle = \psi(N+1) - \psi(N-k+1), \qquad (B5)$$

where $\psi(x) = \frac{d}{dx} \ln \Gamma(x)$ is the digamma function. Finally, using the series representation 

$$\psi(n) = -\gamma + \sum_{j=1}^{n-1} \frac{1}{j}, \qquad (B6)$$

where $\gamma = 0.577215\ldots$ is Euler's constant, we obtain 

$$\langle g_k \rangle = \frac{1}{N-k}, \qquad 1 \le k \le N-1. \qquad (B7)$$

On the other hand, using the extremal condition Eq. (1) one finds the typical value 

$$g_k \approx -\log\left(1 - \frac{1}{N-k}\right) \approx \frac{1}{N-k} \quad \text{for } N-k \gg 1. \qquad (B8)$$

Thus $\langle g_k \rangle$ monotonically increases as $k$ increases. Note that in this case the extremal condition Eq. (1) does not 
describe well the behavior when $k$ is close to $N$. 
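
The breakdown of the extremal estimate near $k = N$ is easy to see numerically. The short sketch below compares the exact mean gap $1/(N-k)$ for $P(t) = \exp(-t)$ with the extremal value $-\log[1 - 1/(N-k)]$, which follows from $t_k = -\ln(1 - k/N)$. The function names and the choice $N = 1000$ are illustrative assumptions.

```python
import math

def exact_gap(n, k):
    """Exact mean gap 1/(N - k) for P(t) = exp(-t), valid for 1 <= k <= N - 1."""
    return 1.0 / (n - k)

def extremal_gap(n, k):
    """Typical gap t_{k+1} - t_k from the extremal condition t_k = -ln(1 - k/N)."""
    if k + 1 >= n:
        return math.inf  # the extremal estimate of t_N diverges
    return -math.log(1.0 - 1.0 / (n - k))

N = 1000
for k in (1, 500, 990, 998, 999):
    print(k, exact_gap(N, k), extremal_gap(N, k))
```

For $N - k \gg 1$ the two values are nearly identical, whereas for the last few finishers the extremal estimate overshoots and finally diverges at $k = N - 1$.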